
(CVPR 2018) Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser

Keyword [HGD]

Liao F, Liang M, Dong Y, et al. Defense against adversarial attacks using high-level representation guided denoiser[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1778-1787.

1. Overview

1.1. Motivation

  • a small residual perturbation left in the input is amplified to a large magnitude in the top layers of the model

This paper proposes a high-level representation guided denoiser (HGD) as a defense for image classification. HGD is:

  • more robust to both white-box and black-box attacks
  • trained on a small subset of images, yet generalizes well to other images and unseen classes
  • transferable: it can defend models other than the one that guides it

1.2. Related Work

1.2.1. Attack Methods

  • box-constrained L-BFGS
  • FGSM
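FGSM perturbs an input by one step of size eps along the sign of the loss gradient with respect to the input. A minimal sketch, assuming a toy logistic-regression classifier in place of a real CNN (the model, weights and function names are hypothetical, chosen only so the gradient can be written by hand):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    # returns -1, 0 or +1
    return (v > 0) - (v < 0)


def input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a hypothetical toy classifier p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def fgsm(x, y, w, b, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss)."""
    g = input_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.1], 1
x_adv = fgsm(x, y, w, b, eps=0.1)
print(x_adv)  # each feature moved by eps in the loss-increasing direction
```

With a real network the gradient would come from automatic differentiation; the update rule is the same.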

1.2.2. Defense Methods

  • adversarial training: augmenting the training data with perturbed examples (time-consuming). It can even improve accuracy on clean images on some datasets, but this was not observed on ImageNet
  • preprocessing
    • denoising auto-encoder, median filter, averaging filter, Gaussian low-pass filter, JPEG compression
    • a two-step defense model: detect adversarial inputs, then reform them based on the difference between the manifolds of clean and adversarial examples
  • defenses relying on a gradient masking effect
    • deep contrastive network
    • knowledge distillation
    • saturating networks
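As an illustration of the preprocessing family listed above, here is a minimal median-filter sketch. It assumes a grayscale image stored as a list of rows of floats and uses edge replication at the borders; both are illustrative choices, not details from the paper.

```python
from statistics import median


def median_filter(img, k=3):
    """k x k median filter with edge replication.

    img: list of rows (floats). Returns a new image of the same size.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # gather the k x k window, clamping indices at the borders
            window = [
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


noisy = [[0.0] * 5 for _ in range(5)]
noisy[2][2] = 1.0  # an isolated spike, i.e. high-frequency noise
cleaned = median_filter(noisy)
```

The filter removes isolated spikes like this one, but it is applied blindly to every pixel; it has no notion of which perturbations actually matter to the classifier.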

2. Methods

2.1. Pixel Guided Denoiser (PGD)

2.2. High-level Representation Guided Denoiser (HGD)

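  • feature guided denoiser (FGD): guided by the representation at layer l = -2 (the layer below the logits); unsupervised
  • logits guided denoiser (LGD): guided by the logits, layer l = -1; unsupervised
  • class label guided denoiser (CGD): guided by the classification loss of the target model with the true label; supervised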
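The difference between the guidance signals can be sketched with three loss functions: PGD compares denoised and clean images pixel by pixel (an L1 distance), FGD/LGD compare the target model's representations f_l of the two images, and CGD uses cross-entropy on the true label. A minimal sketch, assuming the L1 distance for the representation loss and a hypothetical two-class linear "target model" (toy_logits) standing in for a CNN:

```python
import math


def l1(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def pgd_loss(x_clean, x_denoised):
    """Pixel guided loss: L1 distance in pixel space."""
    return l1(x_clean, x_denoised)


def hgd_loss(f, x_clean, x_denoised):
    """Representation guided loss: L1 distance between f(x_clean) and
    f(x_denoised), where f is a (frozen) layer of the target model."""
    return l1(f(x_clean), f(x_denoised))


def cgd_loss(logits_fn, x_denoised, label):
    """Class-label guided loss: cross-entropy of the target model's
    logits on the denoised image (computed with log-sum-exp)."""
    z = logits_fn(x_denoised)
    m = max(z)
    log_sum = m + math.log(sum(math.exp(zi - m) for zi in z))
    return log_sum - z[label]


def toy_logits(x):
    """Hypothetical target model: its logits ignore the third pixel."""
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


x_clean = [0.2, 0.4, 0.6]
x_denoised = [0.2, 0.4, 0.9]  # differs only where the model does not look
print(pgd_loss(x_clean, x_denoised))              # about 0.3: penalized
print(hgd_loss(toy_logits, x_clean, x_denoised))  # 0.0: LGD-style loss
```

The toy case shows the design choice: a pixel loss spends effort on noise the classifier ignores, while a representation loss only penalizes differences that reach the guiding layer. Gradients of these losses would flow into the denoiser's parameters, with the target model kept fixed.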

3. Experiments

3.1. PGD

  • the DAE's accuracy drops significantly on clean images
  • the denoising loss and the classification accuracy of PGD do not track each other well: a lower denoising loss does not necessarily give higher accuracy

  • analyze the layer-wise perturbations of the target model activated by PGD-denoised images: the small residual perturbation grows in the top layers

  • at the final layer, the perturbation left by LGD is much lower than that of PGD-denoised and adversarial images, and close to that of a random perturbation
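The layer-wise analysis compares each layer's activation for a denoised (or adversarial) image with the activation for the clean image. A minimal sketch, assuming the relative L1 form |f_l(x') - f_l(x)| / |f_l(x)| and a hypothetical two-layer ReLU network (dense_relu) in place of the real target model:

```python
def relative_perturbation(a, b):
    """|a - b|_1 / |a|_1 for two activation vectors (a is the clean one).
    Assumes the clean activation is not all zeros."""
    num = sum(abs(ai - bi) for ai, bi in zip(a, b))
    den = sum(abs(ai) for ai in a)
    return num / den


def layerwise_perturbation(layers, x_clean, x_test):
    """Relative perturbation at every layer for x_test vs x_clean."""
    out = []
    a, b = x_clean, x_test
    for layer in layers:
        a, b = layer(a), layer(b)
        out.append(relative_perturbation(a, b))
    return out


def dense_relu(W):
    """Hypothetical toy layer computing ReLU(W x)."""
    def layer(x):
        return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in W]
    return layer


layers = [dense_relu([[1.0, 0.0], [0.0, 1.0]]), dense_relu([[1.0, -1.0]])]
errs = layerwise_perturbation(layers, [1.0, 0.5], [1.0, 0.6])
print(errs)  # the same absolute residual grows in relative terms at the top
```

Even in this toy network the residual perturbation is about 0.07 relative to the clean activation after the first layer and 0.2 after the second, a small instance of the amplification effect that motivates HGD.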

3.2. HGD

  • HGD is more robust to both white-box and black-box attacks than PGD and ensV3 (the ensemble adversarially trained Inception V3)
  • the differences among the HGD variants (FGD, LGD, CGD) are insignificant
  • learning to denoise only is much easier than learning the coupled task of classification and defense

3.3. HGD as an Anti-adversarial Transformer

  • LGD does not suppress the total noise as PGD does; instead, it adds more perturbation to the image

  • notation: * marks the adversarial perturbation, ^ marks the perturbation predicted by the denoiser
  • the slope of PGD's fitted line is < 1: PGD removes only a portion of the adversarial noise
  • the slope of LGD's fitted line is > 1: the estimate is very noisy, which leads to high pixel-level noise
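The slopes above come from fitting a line through (adversarial perturbation, predicted perturbation) pairs. A minimal sketch, assuming an ordinary least-squares fit over per-pixel values; the sample numbers are made up purely to show how slope < 1 and slope > 1 arise:

```python
def fitted_slope(adv, pred):
    """Least-squares slope of predicted perturbation (y) regressed on
    adversarial perturbation (x), with an intercept."""
    n = len(adv)
    mx = sum(adv) / n
    my = sum(pred) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(adv, pred))
    var = sum((x - mx) ** 2 for x in adv)
    return cov / var


adv = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up per-pixel adversarial noise
# PGD-like: predicts only half of the noise, so half of it remains
pred_pgd = [0.5 * a for a in adv]
# LGD-like: overshoots and adds noise uncorrelated with adv
extra = [1.0, -1.0, 0.0, -1.0, 1.0]
pred_lgd = [1.5 * a + e for a, e in zip(adv, extra)]

print(fitted_slope(adv, pred_pgd))  # < 1
print(fitted_slope(adv, pred_lgd))  # > 1, with scatter around the line
```

The slope alone does not capture the scatter; the large residuals of the LGD-like case are what show up as high pixel-level noise.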